Sains Malaysiana 52(10)(2023): 2971-2983

http://doi.org/10.17576/jsm-2023-5210-18

 

Classifying Severity of Unhealthy Air Pollution Events in Malaysia: A Decision Tree Model

(Mengelaskan Keparahan Kejadian Pencemaran Udara Tidak Sihat di Malaysia: Hasil Model Pokok Keputusan)

 

NURULKAMAL MASSERAN1,*, RAZIK RIDZUAN MOHD TAJUDDIN1 & MOHD TALIB LATIF2,3

 

1Department of Mathematical Sciences, Faculty of Science and Technology

Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia

2Department of Earth Sciences and Environment, Faculty of Science and Technology

Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia

3Department of Environmental Health, Faculty of Public Health, Universitas Airlangga, Surabaya, East Java 60115, Indonesia

 

Received: 16 June 2023/Accepted: 2 October 2023

 

Abstract

Data mining techniques are widely applied to real-world problems across many knowledge domains. This study proposes a severity measure, based on the duration and intensity size of an event, for evaluating unhealthy air pollution events. It also proposes a decision tree as a predictive model for the binary classification of extreme and non-extreme unhealthy air pollution events, where the two classes are defined by the threshold of the power-law behavior. Duration and intensity size were likewise identified as important related features. A case study was conducted using air pollution index data for Klang, Malaysia, from 1 January 1997 to 31 August 2020. The results show that the decision tree model provides a high degree of precision and generalization, with 100% accuracy in classifying air pollution severity in the Klang area as extreme or non-extreme. In addition, duration size is the most influential feature leading to the occurrence of an extreme air pollution event. This study therefore suggests that authorities should take precautions when a pollution incident persists for more than 11 consecutive hours.

 

Keywords: Air pollution classification; data mining; extreme air pollution; predictive model
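
As a rough illustration of the event characteristics described in the abstract, the sketch below groups consecutive unhealthy hours of an hourly air pollution index (API) series into events and records each event's duration and an intensity value. It is not the authors' fitted tree. The function names, the cutoff of API > 100 for "unhealthy", and the use of peak exceedance as the intensity measure are assumptions made for illustration; only the 11-hour duration cutoff comes from the abstract, and it is applied here as a single-split rule standing in for a full decision tree.

```python
from itertools import groupby

UNHEALTHY_API = 100   # assumption: API readings above 100 count as unhealthy
DURATION_SPLIT = 11   # hours; the consecutive-duration cutoff quoted in the abstract


def extract_events(hourly_api, threshold=UNHEALTHY_API):
    """Group consecutive hours above the threshold into pollution events."""
    events = []
    for unhealthy, run in groupby(hourly_api, key=lambda v: v > threshold):
        if unhealthy:
            values = list(run)
            events.append({
                "duration": len(values),               # consecutive unhealthy hours
                "intensity": max(values) - threshold,  # assumption: peak exceedance
            })
    return events


def needs_vigilance(event, split=DURATION_SPLIT):
    """Single-split rule (hypothetical stand-in for the tree's first split)."""
    return event["duration"] > split


# Toy hourly series (made-up values): a 2-hour event and a 12-hour event.
series = [80, 120, 130, 90] + [150] * 12 + [70]
for event in extract_events(series):
    print(event, needs_vigilance(event))
```

A full analysis would fit a tree on labelled events, with labels derived from the power-law threshold on severity, rather than hard-coding a single cutoff.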

 

Abstract (English translation of the Malay version)

The application of data mining techniques to real-world problems is popular across various knowledge domains. This study proposes the concept of a severity measure corresponding to the characteristics of duration and intensity size for evaluating unhealthy air pollution events. In line with this, the study also proposes the decision tree method as a predictive model for the binary classification of extreme and non-extreme unhealthy air pollution events, which can be identified based on the threshold value of the power-law behavior. In addition, other characteristics, namely duration and intensity size, were also identified as important related features of an air pollution event. In this study, a case study was conducted using air pollution index data for Klang, Malaysia, from 1 January 1997 to 31 August 2020. The results show that the decision tree model provides a high level of accuracy and generalization, with 100% accuracy in classifying extreme and non-extreme pollution events with respect to air pollution severity in the Klang area. Furthermore, duration size was identified as the influential feature leading to the occurrence of extreme air pollution events. Therefore, this study also suggests that authorities should take precautionary measures if an air pollution event is found to persist for more than 11 consecutive hours.

Keywords: Predictive model; extreme air pollution; air pollution classification; data mining

 


 

*Corresponding author; email: kamalmsn@ukm.edu.my